Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disk: Pass /dev/null to containers.attach stdin #59

Closed
wants to merge 1 commit into from

Conversation

ckyrouac
Copy link
Collaborator

@ckyrouac ckyrouac commented Jul 8, 2024

When using os.Stdin, the initial ssh connection pipe is broken.

When using os.Stdin, the initial ssh connection pipe is broken.

Signed-off-by: Chris Kyrouac <[email protected]>
@ckyrouac ckyrouac requested a review from cgwalters July 8, 2024 16:07
@ckyrouac
Copy link
Collaborator Author

ckyrouac commented Jul 8, 2024

@cgwalters This seems to behave the same as using os.Stdin on my machine. I don't see any progress bar either way though, so maybe I'm doing something wrong.

When using os.Stdin, the initial ssh connection is busted until sending a newline. This is the error:

ERRO[0110] Failed to write input to service: write unix @->/home/chris/.local/share/containers/podman/machine/qemu/podman.soc
k: write: broken pipe

@cgwalters
Copy link
Contributor

Argh, sorry for the regression here!

I don't see any progress bar either way though, so maybe I'm doing something wrong.

You need bootc built from git main with the changes from containers/bootc#655

This is effectively reverting #55 right? I think nil is equivalent to /dev/null.

Hmmm....this makes me think that what's going wrong here is we're not correctly detaching from the container? There's a whole lot going on in podman attach.go.

Copy link
Contributor

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a git revert fedcf86b8796ec9f5fe23dfc748e81708744da6b is clearer but this is fine too

@cgwalters
Copy link
Contributor

Tangentially this code is IMO another great example of the superficial simplicity of Go being paid down later...trying to use nil for "unset value" except that comes back to bite with "Note that an interface value that holds a nil concrete value is itself non-nil. ". In Rust it'd just be Option<T>...

@germag
Copy link
Collaborator

germag commented Jul 8, 2024

@cgwalters This seems to behave the same as using os.Stdin on my machine. I don't see any progress bar either way though, so maybe I'm doing something wrong.

When using os.Stdin, the initial ssh connection is busted until sending a newline. This is the error:

ERRO[0110] Failed to write input to service: write unix @->/home/chris/.local/share/containers/podman/machine/qemu/podman.soc
k: write: broken pipe

I didn't see that when I tested it, it's true that the e2e test don't run on my machine, for some reason it got killed by the OOM killer

@germag
Copy link
Collaborator

germag commented Jul 8, 2024

Tangentially this code is IMO another great example of the superficial simplicity of Go being paid down later...trying to use nil for "unset value" except that comes back to bite with "Note that an interface value that holds a nil concrete value is itself non-nil. ". In Rust it'd just be Option<T>...

A famous question asked to the Go team at a conference: "Why did you choose to ignore any research about type systems since the 1970s"

@cgwalters
Copy link
Contributor

I briefly dug at this a bit more, and what I think is going on is again a classic Go problem¹. Basically, the Attach API spawns a bunch of goroutines, such as this one which copies stdin.

In theory, us cancelling the context should terminate all work spawned from the Attach API...but it doesn't, because goroutines are "stackful" and famously not easily cancellable. And so the Attach API just leaks these goroutines.

If one just invokes podman run --rm -ti someimage somecommand (which is what we're replicating in custom code) the basic fact is that when the podman process exits all its goroutines and process state are torn down reliably by the kernel, papering over this leak.

I'm pretty tempted to just switch back to forking off podman run here. It's not clear to me that using the Go API bindings to the HTTP API is buying us a lot of value here.

That said, I think we could fix the podman code here by changing it to force close the socket when the context is cancelled. That's the usual way to cancel reads in Go. It's still fundamentally racy though without going to the extra level of having something like a "cancellation complete" channel in this API that synchronously waits for the worker goroutines to exit though.

¹ Yes, yes it's a poor craftsman who blames their tools and to be honest, asynchronous Rust also has its share of traps - and that trap is very much precisely because of how cancellation of an async task is just a simple "drop"...but, at least it's pretty straightforward to audit Rust codebases for use of select.

@cgwalters
Copy link
Contributor

#61 is an alternative fix

cgwalters added a commit to cgwalters/podman that referenced this pull request Jul 9, 2024
See:

- containers/podman-bootc#59 (comment)
- containers/podman-bootc#61

Sorry for not trying to fix this, but I am not aware of
a remotely straightforward way to do so.

Signed-off-by: Colin Walters <[email protected]>
@germag
Copy link
Collaborator

germag commented Jul 15, 2024

I'm closing this since #61 supersedes it

@germag germag closed this Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants